In recent years, topic modeling has emerged as a powerful technique for organizing and summarizing large collections of documents and for searching them for particular patterns. However, privacy concerns arise when data from different sources must be cross-analyzed. Federated topic modeling addresses this issue by allowing multiple parties to jointly train a topic model without sharing their data. While several federated approximations of classical topic models exist, no research has addressed their application to neural topic models. To fill this gap, we propose and analyze a federated implementation based on state-of-the-art neural topic modeling implementations, showing its benefits when topics are diverse across the nodes' documents and a joint model is needed. By construction, our approach is equivalent, both in theory and in practice, to a centralized approach while preserving the privacy of the nodes.
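The equivalence claim can be illustrated with a minimal sketch. It assumes, which the abstract does not state, that each node reports the gradient of its local mean loss together with its sample count, and that a server combines these by a sample-weighted average. A toy least-squares model stands in for the neural topic model, and the names `mse_gradient` and `federated_gradient` are hypothetical.

```python
import numpy as np

def mse_gradient(w, X, y):
    """Gradient of the mean squared error of a linear model on one dataset."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

def federated_gradient(w, node_data):
    """Sample-weighted average of per-node gradients (assumed aggregation rule).

    node_data is a list of (X, y) pairs, one per node; raw data never leaves
    this function's per-node gradient call in the sketch.
    """
    total = sum(len(y) for _, y in node_data)
    weighted = sum(len(y) * mse_gradient(w, X, y) for X, y in node_data)
    return weighted / total

# Toy data split unevenly across three hypothetical nodes.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
w = rng.normal(size=4)
nodes = [(X[:5], y[:5]), (X[5:18], y[5:18]), (X[18:], y[18:])]

# Compare the aggregated gradient with the gradient on the pooled data.
print(np.allclose(federated_gradient(w, nodes), mse_gradient(w, X, y)))
```

Because the pooled mean loss is the sample-weighted mean of the local mean losses, the two gradients agree up to floating-point tolerance, which is the sense in which such a scheme matches centralized training step by step.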
Quantitative cancer image analysis relies on the accurate delineation of tumours, a highly specialised and time-consuming task. For this reason, methods for automated segmentation of tumours in medical imaging have been extensively developed in recent years, with Computed Tomography being one of the most widely explored imaging modalities. However, the large number of 3D voxels in a typical scan makes it prohibitive to analyse the entire volume at once on conventional hardware. To overcome this issue, downsampling and/or resampling are generally applied when using traditional convolutional neural networks in medical imaging. In this paper, we propose a new methodology that introduces sparsification of the input images and submanifold sparse convolutional networks as an alternative to downsampling. As a proof of concept, we applied this methodology to Computed Tomography images of renal cancer patients, obtaining kidney and tumour segmentation performance competitive with previous methods (~84.6% Dice similarity coefficient), while achieving a significant improvement in computation time (2-3 min per training epoch).
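For reference, the Dice similarity coefficient quoted above is 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. The sketch below computes it for binary NumPy masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions, not details taken from the paper.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / denom

# Tiny 2D example; real CT masks would be 3D arrays of voxels.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(dice_coefficient(pred, target))
```

In the example the masks share one voxel out of three marked in total, so the score is 2 * 1 / 3.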
Recent work in the Machine Learning (ML) literature has demonstrated the usefulness of assessing which observations have labels that are hardest to predict accurately. By identifying such instances, one may inspect whether they have quality issues that should be addressed. Learning strategies based on the difficulty level of the observations can also be devised. This paper presents a set of meta-features, known as instance hardness measures, that aim to characterize which instances of a dataset are hardest to predict accurately and why. Both classification and regression problems are considered. Synthetic datasets with different levels of complexity are built and analyzed. A Python package containing all implementations is also provided.
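As an illustration of what an instance hardness measure looks like, the sketch below implements k-Disagreeing Neighbors (kDN), a measure from the instance hardness literature that scores each instance by the fraction of its k nearest neighbours carrying a different label. Whether the paper's package includes this particular measure is not stated in the abstract, and the function name and signature here are hypothetical, not the package's API.

```python
import numpy as np

def k_disagreeing_neighbors(X, y, k=3):
    """Return, per instance, the fraction of its k nearest neighbours
    (Euclidean distance, excluding itself) whose label differs from its own."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances; fine for small datasets.
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # an instance is not its own neighbour
    neighbors = np.argsort(dist, axis=1)[:, :k]
    return (y[neighbors] != y[:, None]).mean(axis=1)

# One-dimensional toy data: the last point sits inside the class-0 cluster
# but is labelled 1, so it should receive the highest hardness.
X = [[0.0], [0.1], [0.2], [5.0], [5.1], [0.15]]
y = [0, 0, 0, 1, 1, 1]
print(k_disagreeing_neighbors(X, y, k=3))
```

Scores near 1 flag instances surrounded by other classes, which are candidates for label-noise inspection; scores near 0 indicate instances deep inside their own class.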
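Uncertainty quantification in automated image analysis is highly desired in many applications. Typically, machine learning models for classification or segmentation are developed only to provide binary answers. However, quantifying the uncertainty of a model can play a critical role in active learning or human-machine interaction. Uncertainty quantification is especially difficult when using deep learning-based models, which are the state of the art in many imaging applications. Current uncertainty quantification approaches do not scale well to high-dimensional real-world problems. Scalable solutions often rely on classical techniques applied during inference, or on training ensembles of identical models with different random seeds, to obtain a posterior distribution. In this paper, we show that these approaches fail to approximate the classification probability. Instead, we propose a scalable and intuitive framework for calibrating ensembles of deep learning models so that they produce uncertainty quantification measurements that approximate the classification probability. On unseen test data, we demonstrate better calibration, sensitivity (in two out of three cases), and precision than standard approaches. We further motivate the use of our method in active learning, in creating pseudo-labels to learn from unlabeled images, and in human-machine collaboration.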
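The Portuguese man-of-war (PMW) is a gelatinous organism with long tentacles capable of causing severe burns, which leads to negative impacts on human activities such as tourism and fishing. Information about the spatio-temporal dynamics of this species is lacking. Using alternative methods to collect data can therefore contribute to its monitoring. Given the widespread use of social networks and the eye-catching appearance of the PMW, Instagram posts may be a promising data source for monitoring. The first task in following this approach is to identify posts that refer to the PMW. This paper reports on the use of convolutional neural networks for PMW image classification in order to automate the recognition of Instagram posts. We created a suitable dataset and trained three different neural networks, VGG-16, ResNet50, and InceptionV3, with and without a pre-training step on the ImageNet dataset. We analyzed their results using accuracy, precision, recall, and F1 score metrics. The pre-trained ResNet50 network gave the best results, obtaining 94% accuracy and 95% precision, recall, and F1 score. These results show that convolutional neural networks can be very effective for recognizing PMW images from the Instagram social network.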
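The four metrics named above follow directly from the confusion counts of a binary classifier. The sketch below shows the standard definitions in plain Python; the function name is hypothetical and the labels are made-up toy values, not results from the paper.

```python
def binary_classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels in {0, 1} (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators by reporting 0.0 (an assumed convention).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels: 1 = "post shows a PMW", 0 = "post does not".
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_classification_metrics(y_true, y_pred))
```

With three true positives, three true negatives, one false positive, and one false negative, all four metrics in this toy example equal 0.75.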
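The deep learning boom motivates researchers and practitioners of computational fluid dynamics who are eager to integrate the two fields. The PINN (physics-informed neural network) method is one such attempt. While most reports in the literature show positive outcomes of applying the PINN method, our experiments with it have dampened that optimism. This work presents our not-so-successful story of using PINN to solve two fundamental flow problems: the 2D Taylor-Green vortex at $Re = 100$ and 2D cylinder flow at $Re = 200$. The PINN method solved the 2D Taylor-Green vortex problem with acceptable results, and we used this flow as an accuracy and performance benchmark. The PINN method needed about 32 hours of training to match the accuracy of a $16 \times 16$ finite-difference simulation, which took less than 20 seconds. The 2D cylinder flow, on the other hand, did not even result in a physical solution. The PINN method behaved like a steady-flow solver and did not capture the vortex-shedding phenomenon. By sharing our experience, we want to emphasize that the PINN method is still a work in progress. More work is needed to make PINN feasible for real-world problems.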
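The 2D Taylor-Green vortex is convenient as an accuracy benchmark because the incompressible Navier-Stokes equations have a closed-form solution for it, against which any solver's error can be measured. The sketch below evaluates one common form of that solution. The sign convention, the periodic domain $[0, 2\pi]^2$, unit density, and unit length and velocity scales (so that $\nu = 1/Re = 0.01$) are assumptions here; the paper's exact setup is not given in the abstract.

```python
import numpy as np

def taylor_green_vortex(x, y, t, nu=0.01):
    """Analytical 2D Taylor-Green vortex (unit density, assumed convention).

    Returns velocity components u, v and pressure p at position (x, y), time t.
    """
    decay = np.exp(-2.0 * nu * t)  # viscous decay of the velocity amplitude
    u = np.cos(x) * np.sin(y) * decay
    v = -np.sin(x) * np.cos(y) * decay
    p = -0.25 * (np.cos(2.0 * x) + np.cos(2.0 * y)) * decay ** 2
    return u, v, p

# Reference fields on a 16 x 16 grid, e.g. for computing a solver's L2 error.
grid = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
X, Y = np.meshgrid(grid, grid, indexing="ij")
u_ref, v_ref, p_ref = taylor_green_vortex(X, Y, t=10.0)
print(u_ref.shape, float(np.abs(u_ref).max()))
```

The velocity field is divergence-free by construction, and its amplitude decays as $e^{-2\nu t}$, so both properties can serve as quick sanity checks on a numerical or PINN solution.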
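In the last decade, light-emitting diodes (LEDs) have replaced common light bulbs in almost every application, from flashlights in smartphones to automotive headlights. Illuminating streets at night requires LEDs to emit a light spectrum that the human eye perceives as pure white. The power associated with such a white light spectrum is distributed not only over the contributing wavelengths but also over the viewing angles. For many applications, the usable light rays need to exit the LED in the forward direction, that is, at small angles to the perpendicular. In this work, we demonstrate that a specifically designed multi-layer thin film on top of a white LED increases the power of pure white light emitted in the forward direction. To this end, the derived multi-objective optimization problem is reformulated via a real-valued, physics-guided objective function that represents the hierarchical structure of our engineering problem. Variants of Bayesian optimization are employed to maximize this non-deterministic objective function based on ray-tracing simulations. Finally, an investigation of the optical properties of suitable multi-layer thin films allows us to identify the mechanism behind the increased directionality of white light: angle- and wavelength-selective filtering causes the multi-layer thin film to play ping-pong with rays of light.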
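Artificial intelligence (AI) provides a promising alternative for streamlining COVID-19 diagnosis. However, concerns surrounding security and trustworthiness impede the collection of large-scale representative medical data, posing a considerable challenge for training a well-generalized model for clinical practice. To address this, we launched the Unified CT-COVID AI Diagnostic Initiative (UCADI), in which the AI model can be trained in a distributed manner and executed independently at each host institution under a federated learning (FL) framework without data sharing. Here we show that our FL model outperforms the local models by a large margin (test sensitivity/specificity in China: 0.973/0.951; in the UK: 0.730/0.942), achieving performance comparable to that of a panel of professional radiologists. We further evaluated the model on hold-out data (collected from two other hospitals left out of the FL) and heterogeneous data (acquired with contrast materials), provided visual explanations for the decisions made by the model, and analyzed the trade-offs between model performance and communication costs in the federated training process. Our study is based on 9,573 chest computed tomography (CT) scans from 3,336 patients collected from 23 hospitals located in China and the UK. Collectively, our work advances the prospects of using federated learning for privacy-preserving AI in digital health.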